1  Introduction


Key problems in the dependability and performance analysis of highly parallel fault-tolerant
computer systems include the combined evaluation of performance and dependability and
the largeness of the underlying (semi-)Markov models. During the course of this grant, we
have obtained major results in three distinct areas: novel solution methods for large
performance and dependability models, methods for the combined evaluation of performance
and dependability, and the application of our methods to practical problems.
2  Large  Markov  Models

We have made considerable progress in developing methods that avoid the generation and
the solution of a large Markov model. The generation is facilitated by a compact model
specification method. Stochastic Petri Nets (SPNs) have been advocated as a means of
automatically generating large Markov models. Nevertheless, the large underlying Markov
model must still be stored and processed.

We have developed a new solution method for the steady-state analysis of Markovian
stochastic Petri nets that can often provide a more efficient solution than the standard
solution method [1].

Besides largeness, practical models are also plagued by stiffness of their generator
matrices. Stiffness causes the solution time to be inordinately long. We have developed a method
for the transient analysis of stiff Markov chains [8]. This method reduces the solution time
by several orders of magnitude.
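
To illustrate why stiffness is costly, the following is a minimal sketch of standard uniformization (also called randomization) for transient CTMC analysis. It is a textbook baseline, not the method of [8]. The two-state availability model, the rate values, and the function name uniformization are assumptions made for illustration. The number of series terms grows roughly with q*t, where q is the largest exit rate, so a model that mixes fast and slow rates over a long time horizon needs many matrix-vector products.

```python
import math

def uniformization(Q, p0, t, tol=1e-10):
    """Transient probabilities p(t) = p0 * exp(Q t) by standard uniformization.

    Q is a generator matrix (list of rows), p0 the initial distribution.
    Returns (p(t), number of series terms used beyond the first).
    """
    n = len(Q)
    q = 1.02 * max(-Q[i][i] for i in range(n))      # uniformization rate >= max exit rate
    # Embedded DTMC of the uniformized chain: P = I + Q / q
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    weight = math.exp(-q * t)                       # Poisson probability of 0 events
    if weight == 0.0:
        raise ValueError("q*t too large for this naive sketch (underflow)")
    v = list(p0)                                    # holds p0 * P^k
    result = [weight * x for x in v]
    accumulated = weight
    k = 0
    while 1.0 - accumulated > tol:                  # stop when the Poisson tail is small
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        accumulated += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result, k

# Assumed example: state 0 = up, state 1 = down; slow failures, fast repairs.
lam, mu = 0.001, 1.0
Q = [[-lam, lam], [mu, -mu]]
p_short, terms_short = uniformization(Q, [1.0, 0.0], 10.0)
p_long, terms_long = uniformization(Q, [1.0, 0.0], 100.0)
```

In this sketch the term count for t = 100 is much larger than for t = 10 even though the failure behavior of interest evolves on the slow time scale; that growth is the cost a stiffness-tolerant method aims to avoid.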

Sensitivity analysis of continuous-time Markov chains (CTMCs) has been considered recently by
several researchers. Such analysis is very useful for bottleneck analysis and optimization
of systems, especially during the design stage. In [6], we extend parametric sensitivity
analysis to SPN models. The rates and probabilities of the transitions of SPN models are
defined as functions of an independent variable. Equations for the sensitivity analysis of
steady-state and transient measures of SPN models are developed and implemented.
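
As a sketch of what such equations look like at the level of the underlying CTMC, the standard relations are given below. The notation is assumed here and is not taken from [6]: Q is the generator matrix, theta the independent variable, pi the steady-state probability vector, p(t) the transient probability vector, e a column vector of ones, and s(t) the derivative of p(t) with respect to theta. The sensitivity equations follow by differentiating the steady-state and Kolmogorov equations with respect to theta.

```latex
\begin{align*}
\pi(\theta)\, Q(\theta) &= 0, &
\pi(\theta)\, e &= 1, \\
\frac{\partial \pi}{\partial \theta}\, Q &= -\,\pi\, \frac{\partial Q}{\partial \theta}, &
\frac{\partial \pi}{\partial \theta}\, e &= 0, \\
\frac{d}{dt}\, p(t) &= p(t)\, Q, &
\frac{d}{dt}\, s(t) &= s(t)\, Q + p(t)\, \frac{\partial Q}{\partial \theta},
\qquad s(t) = \frac{\partial p(t)}{\partial \theta}.
\end{align*}
```

If the initial distribution p(0) does not depend on theta, then s(0) = 0. The sensitivity of a measure that is a weighted sum of state probabilities is obtained by applying the same weights to the derivative vector.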

To avoid constructing and solving a large CTMC, we have proposed in [2]
decomposing the SPN into a set of subnets and solving the individual subnets separately.
Dependence among the subnets requires that, after solving each subnet, certain quantities be
exported to other subnets. A fixed-point iteration is then used over the exported quantities.
We discuss ways of decomposing a net into subnets, the type of quantities that need to be
exchanged between subnets, and the convergence of the fixed-point iterative schemes.
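
The iteration over exported quantities can be sketched as follows. This is a toy illustration under stated assumptions, not the decomposition of [2]: each "subnet" is a hypothetical two-state (up/down) model solved in closed form, and the coupling (a shared repairer whose effective rate drops when the other subnet is down) is invented for the example. The function names and the coupling factor 0.5 are likewise assumptions.

```python
def down_probability(fail_rate, repair_rate):
    """Steady-state probability that a two-state (up/down) submodel is down."""
    return fail_rate / (fail_rate + repair_rate)

def solve_by_fixed_point(lam_a, lam_b, mu, tol=1e-12, max_iter=1000):
    """Iterate over the exported quantities (p_a, p_b) until they stop changing."""
    p_a, p_b = 0.0, 0.0                      # initial guess for the exported quantities
    for iteration in range(1, max_iter + 1):
        # Submodel A imports p_b: assumed coupling slows A's repair when B is down.
        new_p_a = down_probability(lam_a, mu * (1.0 - 0.5 * p_b))
        # Submodel B imports the freshly computed value for A.
        new_p_b = down_probability(lam_b, mu * (1.0 - 0.5 * new_p_a))
        change = max(abs(new_p_a - p_a), abs(new_p_b - p_b))
        p_a, p_b = new_p_a, new_p_b
        if change < tol:
            return p_a, p_b, iteration
    raise RuntimeError("fixed-point iteration did not converge")

p_a, p_b, iterations = solve_by_fixed_point(lam_a=0.1, lam_b=0.2, mu=1.0)
```

Each pass solves only small submodels; the price is that convergence of the iteration has to be established or checked, which is why the sketch caps the number of iterations and fails loudly instead of returning an unconverged value.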

3  Composite  Performance  and  Reliability  Analysis 

The common approach to formulating and solving performability problems is to use
(semi-)Markov reward models. We have proposed Stochastic Reward Nets (SRNs) to facilitate
convenient specification of performance, dependability, and composite performance and
dependability models [1, 9].
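
A minimal sketch of a Markov reward model is given below, assuming a hypothetical multiprocessor with n processors, a per-processor failure rate gamma, and a single repair facility with rate tau (none of these values come from the cited work). A reward rate is attached to each state, and a steady-state measure is the probability-weighted sum of the reward rates. Changing only the reward vector turns the same chain into a performance, dependability, or composite model.

```python
def steady_state_birth_death(n, gamma, tau):
    """Steady-state probabilities pi[i], where i = number of working processors (0..n)."""
    weights = [1.0]                               # unnormalized pi[0]
    for i in range(n):
        # Balance between states i and i+1: pi[i] * tau = pi[i+1] * (i + 1) * gamma
        weights.append(weights[-1] * tau / ((i + 1) * gamma))
    total = sum(weights)
    return [w / total for w in weights]

def expected_reward_rate(pi, rewards):
    """Expected steady-state reward rate: sum over states of reward * probability."""
    return sum(p * r for p, r in zip(pi, rewards))

n, gamma, tau = 4, 0.01, 1.0                      # assumed example parameters
pi = steady_state_birth_death(n, gamma, tau)
capacity = expected_reward_rate(pi, range(n + 1))        # reward = working processors
availability = expected_reward_rate(pi, [0] + [1] * n)   # reward = 1 if any processor works
```

Here capacity is a composite measure (expected number of working processors), while availability is a pure dependability measure computed from the same state probabilities.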

We have been very active in the area of composite performance and reliability modeling.
A new algorithm for the distribution of accumulated reward in a Markov reward process
has been developed [11], and a unified framework for the performance and reliability analysis
of a system with a cumulative downtime constraint has been developed [10]. An invited
state-of-the-art survey on the topic was written for the book edited by H. Takagi [11].

In  the  domain  of  real-time  systems,  Shin  and  his  colleagues  have  developed  methods  of 
incorporating  the  effects  of  failures  in  a  performance  model.  On  the  other  hand,  Meyer  and 
his  colleagues  have  developed  the  framework  of  performability.  We  have  recently  managed 
to  combine  these  two  views  in  a  single  framework  [9]. 

Performability models normally incorporate throughput-like performance measures. In
many systems, however, the response-time distribution may be a more critical index of
performance. Computing the response-time distribution for all but simple queues has hitherto
been considered a very difficult problem (short of expensive simulation). We have
developed a new method based on a "tagged job approach" for this difficult problem. We have
further incorporated this performance measure in a performability model [7, 9].
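
The idea behind following a tagged job can be sketched for the simplest case, an M/M/1 FCFS queue. This is the textbook single-queue argument, not the general method of [7, 9], and the function names and parameter values are assumptions. By the PASTA property, the tagged job finds k jobs in the system with probability (1 - rho) * rho^k; its response time is then the sum of k + 1 exponential service times, that is, an Erlang distribution. Unconditioning over k gives the response-time distribution.

```python
import math

def erlang_cdf(stages, rate, t):
    """P(sum of `stages` i.i.d. exponential(rate) times <= t)."""
    term, tail = math.exp(-rate * t), 0.0
    for j in range(stages):
        tail += term                      # adds exp(-rate*t) * (rate*t)^j / j!
        term *= rate * t / (j + 1)
    return 1.0 - tail

def response_time_cdf(lam, mu, t, tol=1e-12):
    """P(response time <= t) for an M/M/1 FCFS queue via a tagged job (needs lam < mu)."""
    rho = lam / mu
    weight = 1.0 - rho                    # P(tagged job finds 0 jobs on arrival)
    cdf, k = 0.0, 0
    while weight > tol:                   # truncate once the remaining mass is negligible
        cdf += weight * erlang_cdf(k + 1, mu, t)
        weight *= rho
        k += 1
    return cdf

lam, mu, t = 0.5, 1.0, 2.0                # assumed example parameters
tagged = response_time_cdf(lam, mu, t)
closed_form = 1.0 - math.exp(-(mu - lam) * t)   # known M/M/1 result, for comparison
```

For this simple queue the result agrees with the known exponential form with rate mu - lam. The point of the tagged-job view is that the same conditioning argument carries over to models where no closed form is available.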

Performability models use decomposition, taking advantage of the different time scales of
performance-related and failure-related events. If we want a more accurate analysis that
takes into account the loss of work following a failure and checkpointing, advanced stochastic
process methods have to be used [5]. Although we have obtained some success along these
lines and solved 'small' problems, more research is necessary on this topic.

4  Applications 

We  have  applied  the  techniques  we  have  developed  to  solve  interesting  problems  proposed 
by  industry.  With  GTE  Labs  scientists  we  have  worked  on  the  performance  analysis  of 
a  polling  system  and  that  of  a  vacation  queueing  system  [3,  4].  One  of  the  requirements 
that database systems have to satisfy is that the response time not exceed a given threshold.
Due  to  an  impetus  provided  by  NCR  researchers,  we  considered  the  problem  of  computing 
response  time  distribution  (as  opposed  to  just  the  mean  response  time)  [7].  Our  ideas  have 
been  applied  by  the  Software  Productivity  Consortium  in  the  performance  analysis  of  large 
Ada  designs. 